Search results for "text classification"

Showing 7 of 7 documents

T100: A modern classic ensemble to profile irony and stereotype spreaders

2022

In this work we propose a novel ensemble model based on deep learning and non-deep learning classifiers. The proposed model was developed by our team to participate in the Profiling Irony and Stereotype Spreaders (ISSs) task hosted at PAN@CLEF2022. Our ensemble (named T100) includes a Logistic Regressor (LR) that classifies an author as ISS or not (nISS) based on the predictions provided by a first stage of classifiers. All these classifiers are able to reach state-of-the-art results on several text classification tasks. These classifiers (namely, the voters) are a Convolutional Neural Network (CNN), a Support Vector Machine (SVM), a Decision Tree (DT) and a Naive Bayes (NB) classifie…

Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni; Settore ING-INF/03 - Telecomunicazioni; irony; stereotypes; author profiling; text classification; Twitter; ensemble; logistic regressor
researchProduct
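
The two-stage design described in the T100 abstract above (first-stage voters whose predictions feed a logistic regressor) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy votes, the labels, the learning rate, the epoch count, and the function names `fit_meta_classifier` and `predict` are all assumptions made for illustration, and the voters are stood in for by hard-coded 0/1 predictions.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_meta_classifier(voter_preds, labels, lr=0.5, epochs=2000):
    """Fit a logistic regression on first-stage voter predictions
    using plain batch gradient descent (hypothetical hyperparameters)."""
    n_voters = len(voter_preds[0])
    n = len(labels)
    weights = [0.0] * n_voters
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_voters
        grad_b = 0.0
        for x, y in zip(voter_preds, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = sigmoid(z) - y  # gradient of the log-loss w.r.t. z
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    # Returns 1 (ISS) or 0 (nISS) for one author's vector of voter outputs.
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


# Hypothetical toy data: each row holds the [CNN, SVM, DT, NB] votes for one author.
voter_preds = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
]
labels = [1, 1, 0, 0, 1, 0]  # 1 = ISS, 0 = nISS (made-up labels)

weights, bias = fit_meta_classifier(voter_preds, labels)
train_predictions = [predict(weights, bias, x) for x in voter_preds]
```

In this sketch the meta-classifier learns how much to trust each voter, rather than counting votes equally as a plain majority vote would.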

A Layered Architecture for Sentiment Classification of Products Reviews in Italian Language

2017

The paper illustrates a system for automatically classifying the sentiment orientation expressed in reviews written in Italian. A suitable stratification of linguistic resources is adopted to compensate for the lack of an opinion lexicon specifically suited to the Italian language. Experiments show that the proposed system can be applied to a wide range of domains.

Sentiment analysis; Text Classification of Reviews; Settore INF/01 - Informatica; Computer science; Orientation (computer vision); business.industry; Multitier architecture; Italian language; 02 engineering and technology; Lexicon; computer.software_genre; Range (mathematics); 020204 information systems; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; Natural language processing
researchProduct

An SVM Ensamble Approach to Detect Irony and Stereotype Spreaders on Twitter

2022

The problem we address in this work is classifying whether a Twitter user has spread Irony and Stereotype or not. We used a text vectorization layer to generate Bag-Of-Words sequences. These sequences are then passed to three different text classifiers (Decision Tree, Convolutional Neural Network, Naive Bayes). Our final classifier is an SVM. To test and validate our approach we used the dataset provided for the author profiling task organized by PAN@CLEF 2022. Our team (missino) submitted the predictions on the provided test set to participate in the shared task. Across several rounds of cross-validation, our approach reached a maximum binary accuracy on the best validation split equal to…

author profiling; ensamble; irony; PAN2022; stereotype; SVM; text classification; Settore ING-INF/03 - Telecomunicazioni
researchProduct
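
The abstract above mentions a text vectorization layer that produces Bag-Of-Words representations. A minimal sketch of that general idea, as a simple count vector, is given below. The abstract does not specify the tokenizer, vocabulary size, or output mode, so whitespace tokenization, lowercasing, the `max_tokens` parameter, and the example tweets are all assumptions, and `build_vocabulary` and `bag_of_words` are hypothetical helper names.

```python
from collections import Counter


def build_vocabulary(texts, max_tokens=None):
    """Map each token to a column index, most frequent tokens first.
    Whitespace tokenization and lowercasing are assumptions."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    return {tok: i for i, (tok, _) in enumerate(counts.most_common(max_tokens))}


def bag_of_words(text, vocab):
    """Return a count vector over the vocabulary; unknown tokens are ignored."""
    vector = [0] * len(vocab)
    for tok in text.lower().split():
        idx = vocab.get(tok)
        if idx is not None:
            vector[idx] += 1
    return vector


# Made-up example tweets, not taken from the PAN@CLEF 2022 dataset.
tweets = ["so funny so true", "not funny at all"]
vocab = build_vocabulary(tweets)
vectors = [bag_of_words(t, vocab) for t in tweets]
```

Vectors like these could then be passed to the first-stage classifiers, with their outputs combined by a final classifier.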

Multitask deep learning for native language identification

2020

Identifying the native language of a person from text they have written in English (L1 identification) plays an important role in such tasks as authorship profiling and identification. With the current proliferation of misinformation in social media, these methods are especially topical. Most studies in this field have focused on the development of supervised classification algorithms that are trained on a single L1 dataset. Although multiple labeled datasets are available for L1 identification, they contain texts authored by speakers of different languages and do not completely overlap. Current approaches achieve high accuracy on available datasets, but this is attained by training an individua…

natural language; ComputingMethodologies_PATTERNRECOGNITION; text classification; machine learning; text mining; deep learning; native language; natural language processing; English language; multitask learning
researchProduct

Improving Irony and Stereotype Spreaders Detection using Data Augmentation and Convolutional Neural Network

2022

In this paper we describe a deep learning model based on a Data Augmentation (DA) layer followed by a Convolutional Neural Network (CNN). The proposed model was developed by our team for the Profiling Irony and Stereotype Spreaders (ISSs) task proposed by the PAN 2022 organizers. As a first step, to classify an author as ISS or not (nISS), we developed a DA layer that expands each sample in the dataset provided. Using this augmented dataset we trained the CNN. Then, to submit our predictions, we applied our DA layer to the samples in the unlabeled test set as well. Finally, we fed the augmented test set to our trained CNN to generate our final predictions. To develop and test our model we us…

Settore ING-INF/03 - Telecomunicazioni; author profiling; convolutional neural network; data augmentation; irony; stereotypes; text classification; Twitter
researchProduct

Fake News Spreaders Detection: Sometimes Attention Is Not All You Need

2022

Guided by a corpus linguistics approach, in this article we present a comparative evaluation of State-of-the-Art (SotA) models, with a special focus on Transformers, to address the task of Fake News Spreaders (i.e., users that share Fake News) detection. First, we explore the reference multilingual dataset for the considered task, exploiting corpus linguistics techniques, such as chi-square test, keywords and Word Sketch. Second, we perform experiments on several models for Natural Language Processing. Third, we perform a comparative evaluation using the most recent Transformer-based models (RoBERTa, DistilBERT, BERT, XLNet, ELECTRA, Longformer) and other deep and non-deep SotA models (CNN,…

Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni; Settore ING-INF/03 - Telecomunicazioni; fake news; misinformation; Natural Language Processing (NLP); transformers; Twitter; convolutional neural networks; text classification; deep learning; machine learning; user classification; author profiling; corpus linguistics; linguistic analysis; Information Systems
researchProduct

Toward modernizing the systematic review pipeline in genetics: efficient updating via data mining

2012

Purpose: The aim of this study was to demonstrate that modern data mining tools can be used as one step in reducing the labor necessary to produce and maintain systematic reviews. Methods: We used four continuously updated, manually curated resources that summarize MEDLINE-indexed articles in entire fields using systematic review methods (PDGene, AlzGene, and SzGene for genetic determinants of Parkinson disease, Alzheimer disease, and schizophrenia, respectively; and the Tufts Cost-Effectiveness Analysis (CEA) Registry for cost-effectiveness analyses). In each data set, we trained a classification model on citations screened up until 2009. We then evaluated the ability of the model to class…

text classification; Technology Assessment Biomedical; Databases Factual; Computer science; Cost-Benefit Analysis; Review Literature as Topic; Hardware_PERFORMANCEANDRELIABILITY; Empirical Research; computer.software_genre; 03 medical and health sciences; 0302 clinical medicine; Meta-Analysis as Topic; Alzheimer Disease; Hardware_INTEGRATEDCIRCUITS; Data Mining; Humans; support vector machine; Original Research Article; 030212 general & internal medicine; Genetics (clinical); 030304 developmental biology; Genetics; 0303 health sciences; Parkinson Disease; Pipeline (software); 3. Good health; meta-analysis; machine learning; Schizophrenia; Periodicals as Topic; computer; citation screening; Software; Genetics in Medicine
researchProduct
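
The evaluation setup in the abstract above (train on citations screened up until 2009, then test on later ones) amounts to a temporal split. A minimal sketch follows, assuming each citation is a dictionary with a hypothetical `screened_year` field. The records, field names, and the `temporal_split` helper are invented for illustration and are not drawn from PDGene, AlzGene, SzGene, or the CEA Registry.

```python
def temporal_split(citations, cutoff_year=2009):
    """Split citations into a training set (screened up to and including
    the cutoff year) and a test set (screened afterwards)."""
    train = [c for c in citations if c["screened_year"] <= cutoff_year]
    test = [c for c in citations if c["screened_year"] > cutoff_year]
    return train, test


# Made-up records; "relevant" marks whether a curator included the citation.
citations = [
    {"id": "a", "screened_year": 2007, "relevant": True},
    {"id": "b", "screened_year": 2008, "relevant": False},
    {"id": "c", "screened_year": 2009, "relevant": True},
    {"id": "d", "screened_year": 2010, "relevant": False},
    {"id": "e", "screened_year": 2011, "relevant": True},
]

train_set, test_set = temporal_split(citations)
```

Splitting by time rather than at random keeps later citations out of training, which mirrors how a model would be used to update an existing review.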